PaSE : Parallel Speedup Estimation Framework for Network-on-Chip Based Multi-core Systems
نویسندگان
چکیده
The massive integration of cores in multi-core system has enabled chip designer to design systems while meeting the power-performance demands of the applications. However, full-system simulations traditionally used to evaluate the speedup with these systems are computationally expensive and time consuming. On the other hand, analytical speedup models such as Amdahl’s law are powerful and fast ways to calculate the achievable speedup of these systems. However, Amdahl’s Law disregards the communication among the cores that play a vital role in defining the achievable speedup with the multi-core systems. To bridge this gap, in this work, we present PaSE a parallel speedup estimation framework for multi-core systems that considers the latency of the Network-on-Chip (NoC). To accurately capture the latency of the NoC, we propose a queuing theory based analytical model. Using our proposed PaSE framework, We conduct a speedup analysis for multicore system with real application based traffic i.g. Matrix Multiplication. the multiplication of two [32x32] matrices is considered in NoC based multi-core system with respect to three different cases ideal-core case, integer case (i.g. matrix elements are integer number), and denormal case (i.g. matrix elements are denormal number, NaNs, or infinity). From this analysis, we show how the system size, Network-on-Chip NoC architecture, and the computation to communication (C -to-C ) ratio effect the achievable speedup. To sum up, instead of the simulation based performance estimation, our PaSE framework can be utilized as a design guideline i.g. it is possible to use it to understand the optimal multi-core system-size for certain applications. Thus, this model can reduce the design time and effort of such NoC based multi-core systems.
منابع مشابه
Design of a Low-Latency Router Based on Virtual Output Queuing and Bypass Channels for Wireless Network-on-Chip
Wireless network-on-chip (WiNoC) is considered as a novel approach for designing future multi-core systems. In WiNoCs, wireless routers (WRs) utilize high-bandwidth wireless links to reduce the transmission delay between the long distance nodes. When the network traffic loads increase, a large number of packets will be sent into the wired and wireless links and can...
متن کاملReliability and Performance Evaluation of Fault-aware Routing Methods for Network-on-Chip Architectures (RESEARCH NOTE)
Nowadays, faults and failures are increasing especially in complex systems such as Network-on-Chip (NoC) based Systems-on-a-Chip due to the increasing susceptibility and decreasing feature sizes. On the other hand, fault-tolerant routing algorithms have an evident effect on tolerating permanent faults and improving the reliability of a Network-on-Chip based system. This paper presents reliabili...
متن کاملMulti MicroBlaze System for Parallel Computing
Embedded systems need more computational power to satisfy today’s applications’ needs, like audio/video encoding/decoding, image processing, etc. An option for increasing the computational power of a system is to include various microprocessors and make them work in parallel. This paper presents a study of the viability of making a multiprocessor system on a chip (MPSoC) using the MicroBlaze so...
متن کاملParallelization of Stochastic Estimation Algorithms on Multicore Computational Platforms
The main part of this licentiate thesis concerns parallelization of recursive estimation methods, both linear and nonlinear. Recursive estimation deals with the problem of extracting information about parameters or states of a dynamical system, given noisy measurements of the system output and plays a central role in many applications of signal processing, system identification, and automatic c...
متن کاملA Multilevel Parallelization Framework for High-Order Stencil Computations
Stencil based computation on structured grids is a common kernel to broad scientific applications. The order of stencils increases with the required precision, and it is a challenge to optimize such high-order stencils on multicore architectures. Here, we propose a multilevel parallelization framework that combines: (1) inter-node parallelism by spatial decomposition; (2) intra-chip parallelism...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017